Shopping for day-to-day items and keeping track of the shopping list can be a tedious and
time-consuming procedure, especially if it has to be done frequently. mySmartCart is a proposed
mobile application design that transforms the traditional handwritten shopping list into a
digital smart list, using voice recognition and handwriting recognition to process the user's
natural language input. The system design comprises four modules: i) input, which takes voice and
handwritten list image input from the user; ii) processing, which applies natural language
processing to the input data and converts it to a digital shopping list; iii) classification,
which assigns list items to their respective categories using machine learning algorithms; and
iv) output, which searches e-commerce applications and adds items to the shopping cart.
The proposed design uses natural language to communicate with the user, thus enhancing their
shopping experience. Google Cloud speech recognition and Tesseract optical character recognition
(OCR) have been used for natural language processing in the prototype, along with a support
vector machine classifier for categorization.

Keywords: Speech recognition; Support vector machine

1. INTRODUCTION

Shopping for day-to-day supplies is an essential part of daily life, yet it is a time-consuming
and tedious process. The traditional way of shopping relies on handwritten shopping lists and
mental notes [1], [2], which are ineffective and lead to lost time and money. The adoption of
technology has made online shopping possible, and it is now widely used across industry domains.
Alongside the traditional offline mode, several web and mobile applications for grocery shopping
serve domestic requirements. Because grocery shopping is a recurring need, there is a continuing
demand for better solutions. Applications that use text and speech recognition are smart
solutions that are trending in the current technological era.

The semantic meaning of speech is strongly influenced by the speaker's manner of speaking,
articulation, modulation, elocution, and pronunciation. Virtual speakers use synthesizers chosen
according to geographical region. Smart applications based on audio input use voice and speech
recognition algorithms for better user experience and service. Although the terms voice
recognition and speech recognition are often used interchangeably, voice recognition is primarily
concerned with determining a speaker's identity rather than the content of their speech [3],
whereas speech recognition is the process of converting the sound of words or phrases spoken by
humans into electrical signals with meaning attached to them [4]. Handwriting poses a similar
challenge for automation: each user writes differently, with unique strokes, and in various
languages. Optical character recognition (OCR) detects text in images and documents and converts
it to digital text [5]. Several handwriting text recognition techniques have been proposed for
camera-based whiteboard writing detection [6], document image text detection [7], and
digitization of handwritten notes.

In this work, we present mySmartCart, a system design prototype in the form of a mobile
application. The goal of the design is to let users create a shopping list using voice commands
and/or by uploading an image of a handwritten shopping list, and then to display options from
various e-commerce websites from which users can select items and add them to their respective
e-carts. The design uses voice recognition and OCR algorithms to process natural language and a
machine learning algorithm to classify the list items. Section 2 of this paper analyzes existing
work on smart shopping, applications of voice and handwriting recognition, and classification
algorithms that can be implemented in our system. Section 3 explains the proposed system
methodology. Section 4 presents the results of our analysis, followed by the conclusion in
section 5.


2. RELATED WORKS 

In this section, existing approaches to voice recognition, OCR, and classification are discussed
and compared. The advantages and disadvantages of various existing application programming
interfaces (APIs) for natural language processing involving voice and OCR are identified.
Multiple machine learning classification algorithms that can be used to classify the shopping
list items are also discussed to assess their efficacy in segregating the chosen products.


2.1. Voice recognition 

Voice recognition technology allows computers to recognise spoken words and translate them into
text, with the primary benefit of searchability. Automatic speech recognition (ASR) and speech to
text (STT) are other terms for the same technology. Solutions trained in a supervised manner on
voice samples are used to aid the physically challenged, in augmented reality applications, and
in smart shopping, providing ease of use for all stakeholders. Table 1 summarizes the APIs
surveyed.


Table 1. Literature survey on voice recognition APIs

Sl. No. | Technology | Advantages | Disadvantages
1 | Google STT API [8]-[14] | Enables voice input for queries; speech is converted in real time | Result quality is not good; speaker identification algorithm does not exist
2 | Watson speech to text (STT) [15]-[17] | Organized transcripts; corpus-based text to speech (TTS) | Very few languages are supported; memory required is more due to storage of whole phrases
3 | Speech-to-text (STT) based on Mel-frequency cepstral coefficients [18]-[20] | Supports multiple languages; SNR is less, which leads to better accuracy | Training database is self-generated; generating databases for multiple languages is time consuming; difficult to implement new languages
4 | Mozilla's Deep Speech [21]-[23] | Embedded engine; higher accuracy; open source | Poor performance; high memory consumption


2.2. Optical character recognition 

OCR algorithms digitize documents comprising text, images, and video by extracting the text they
contain. Recognizing text elements in a document is useful in fields such as shopping,
handwriting recognition, and medicine. OCR approaches can also recognise multiple languages, as
summarized in Table 2.


2.3. Classification algorithm 

A supervised approach is used to identify the category of new observations based on training
data. Classification algorithms learn from the labelled data they are trained on and then assign
categories to unseen inputs, which is what enables smart applications. Table 3 lists popular
classification algorithms adopted for classifying text.

Table 2. Optical character recognition (OCR) APIs and their usage

Sl. No. | Technology | Advantages | Disadvantages
1 | Tesseract OCR [24], [25] | High accuracy; free; data security | Does not work well with partial occlusion, warped perspective, or a complicated background
2 | Recurrent neural network (RNN) [26]-[28] | Disseminates data over longer distances; provides strong preparation features | Vanishing gradients; learning large data series is challenging
3 | Convolutional neural network (CNN) [29] | Increases text identification performance in whiteboard and handwritten note scenarios; can be used to train any handwritten language; has good accuracy | Overfitting model; explosive gradients; class imbalance
4 | Bidirectional associative memory (BAM) [30] | Completes the heteroassociation challenge by achieving a steady state in a repeating pattern | Recognising difficult patterns is a challenge
5 | Raspberry Pi [31], [32] | Major implementation target of the project, since it acts as a bridge between the camera, sensors, and image processing results, and provides functions to change peripheral equipment (keyboard and USB) | Even though the mistake rate is modest, it needs to be reduced
6 | OpenCV [33] | Free of cost; supports multiple languages | Lower accuracy; battery performance needs to be improved
7 | Directed acyclic graph convolutional neural network (DAG-CNN) [34] | Handles the hand-crafted features problem; 93% accuracy achieved | Overfitting model

Table 3. Literature survey on classification algorithms

Sl. No. | Technology | Advantages | Disadvantages
1 | Discriminative patches, support vector machine (SVM) [35] | Feature extraction takes an average of 27 seconds | While classifying, patch detectors consume the most time
2 | Naive Bayes [36]-[38] | Very good learning speed; it is the fastest | Least accuracy obtained when compared to others
3 | k-nearest neighbors (KNN) [37]-[39] | Least learning period; good performance | Highest computational cost; worst tolerance with parity problem
4 | SVM [36]-[40] | Highest accuracy; good classification speed; noise tolerant; performance with missing values and non-relevant features | Learning duration is long; worst classification prediction; performance degrades with large datasets

3. PROPOSED SMARTCART METHODOLOGY 

mySmartCart is a mobile application idea developed to ease the online shopping experience of a
user. The project design prototype implements six modules as shown in Figure 1. The proposed
architecture comprises the following stages: input through the user interface; pre-processing
followed by processing of the user input; a digital list comprising the user's shopping cart
items; a classifier that transforms the digital list into a smart list; and a search on suitable
e-commerce applications, whose results are displayed to the user.


[Figure 1: block diagram. User interface (choose method of input: voice input or handwritten list
image) -> input pre-processing (audio data, image data) -> processing (voice recognition, optical
character recognition) -> digital list (display shopping list) -> classifier (classification) ->
search (search e-commerce database, display choices) -> display (notification, add items to
cart).]

Figure 1. Proposed architecture of mySmartCart


3.1. Input given to the application 
The prototype takes an input shopping list from the user in two ways: as voice input, or as
handwritten text uploaded by taking a picture of the shopping list. Voice input is fed through a
microphone, and the handwritten text is captured with any camera and uploaded to the application.
The input is then sent to the pre-processing module to improve the accuracy of voice recognition
and OCR.


3.2. Input pre-processing 

Audio input of the shopping list is recorded from the user and converted into spectrograms, from
which meaningful features are extracted. The data is then pre-processed, which entails loading
comma-separated values (CSV) data, label encoding, feature scaling, and splitting the data into
training and test sets. Pre-processing of image data for OCR includes scaling the image,
correcting its skew, binarization, and noise removal. The pre-processed data is fed to the
processing module.
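
As an illustration of two of the image steps named above, the following is a minimal sketch of
binarization and noise removal on a grayscale image held as rows of 0-255 integers. It is not the
prototype's code: the function names, the mean-intensity threshold, and the 3x3 median filter are
assumptions chosen to keep the example free of third-party libraries.

```python
from statistics import median

def binarize(gray, threshold=None):
    """Map a grayscale image (rows of 0-255 ints) to 0 (ink) or 255 (paper)."""
    pixels = [p for row in gray for p in row]
    if threshold is None:
        # Assumption: the mean intensity is an adequate global threshold.
        threshold = sum(pixels) / len(pixels)
    return [[0 if p < threshold else 255 for p in row] for row in gray]

def median_filter(img):
    """Remove isolated speckles with a 3x3 median filter (borders are clamped)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(int(median(window)))
        out.append(row)
    return out

def preprocess(gray):
    """Binarize first, then remove speckle noise."""
    return median_filter(binarize(gray))
```

With a three-pixel-wide pen stroke and a single dark speck on a light background, `preprocess`
keeps the stroke and turns the speck back into background. Scaling and skew correction are left
out of the sketch.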


3.3. Processing of voice/text data 

Once pre-processed data is available, it is fed to the voice recognition or OCR algorithm, which
processes the input and identifies the items for the digital shopping list. Voice is the most
common means of communication. Natural language processing is performed to analyze the audio and
find the best-fitting word for it. Examples of voice recognition APIs are Google Speech API, IBM
Watson API, Amazon Speech to Text API, Rev.AI, and Siri API. OCR is the electronic translation of
images of typed, handwritten, or printed text into machine-encoded text. Artificial neural
networks such as the convolutional neural network (CNN), recurrent neural network (RNN), and
convolutional recurrent neural network (CRNN), as well as Tesseract OCR, are used for text
recognition. The performance of these methods is discussed in section 4.
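
Whichever recognizer is used, its output is a string that still has to be turned into list
items. The sketch below shows one way this step could look; the function name, the separator
rules, and the filler-word set are hypothetical and are not taken from the prototype.

```python
import re

# Filler words that a spoken list often contains (hypothetical set).
FILLERS = {"and", "also", "please", "add", "buy", "some", "a", "an", "the"}

def to_shopping_list(recognized_text):
    """Split recognizer output into unique, normalized item names."""
    # Items may be separated by newlines (handwritten list), by commas or
    # semicolons, or by the word "and" (speech).
    parts = re.split(r"[\n,;]+|\band\b", recognized_text.lower())
    items = []
    for part in parts:
        # Keep alphabetic words only, which also drops list numbering.
        words = [w for w in re.findall(r"[a-z]+", part) if w not in FILLERS]
        item = " ".join(words)
        if item and item not in items:
            items.append(item)
    return items
```

For example, `to_shopping_list("Add milk, two eggs and a toothbrush")` yields the three items
`milk`, `two eggs`, and `toothbrush`. Quantities written as digits are dropped in this
simplified version.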


3.4. Digital list 

The items in the shopping list are generated once the natural language processing module has
identified them. The resulting list is confirmed with the user before product classification,
which supports the creation of a smart shopping list and improves the user experience. The
digital list is then fed to the classifier, which segregates the items so that suitable
e-commerce sites can be recommended.


3.5. Classifier 

A supervised machine learning technique is adopted to classify the items on the shopping list as
grocery or non-grocery items. Machine learning algorithms such as the support vector machine
classifier, Naive Bayes classifier, and random forest classifier are used for classifying the
items in the shopping list. The traditional support vector machine and logistic regression are
the two most popular techniques in machine learning for predictive analysis [37].
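
To make the grocery versus non-grocery decision concrete, here is a small sketch of a
multinomial Naive Bayes classifier, one of the algorithms listed above. Naive Bayes is used only
because it fits in a few lines without third-party libraries; it is not the classifier this
paper finally adopts, which is the support vector machine. The class name and the training items
in the usage note are made up for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesItemClassifier:
    """Multinomial Naive Bayes over the words of an item name."""

    def fit(self, items, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of items
        for item, label in zip(items, labels):
            self.word_counts[label].update(item.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, item):
        total_items = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_items in self.label_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus log likelihoods with add-one (Laplace) smoothing.
            score = math.log(n_items / total_items)
            for word in item.lower().split():
                score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on a handful of labelled items such as "basmati rice" (grocery) and "usb cable"
(non-grocery), the classifier assigns an unseen item to the class whose training words it shares.
A real prototype would need a far larger labelled item list.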


3.6. Search 

Once the digital list has been converted to a smart list by classification, the items in the
smart list are searched for in e-commerce applications through web scraping, for example Amazon
and SnapDeal for non-grocery items, and Grofers and BigBasket for grocery items. The product
details obtained from the e-commerce sites, along with the prices, are stored and sent to the
display module for the user to choose from. The user need not browse multiple websites for
information about a product, but gets the filtered results in one place to make a decision.
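
The routing from category to e-commerce site can be sketched as a simple lookup. The site names
come from the text above; the table layout and the function name are assumptions, and the web
scraping itself is deliberately left out.

```python
# Site names are taken from the text; the routing table itself is an assumption.
SITES_BY_CATEGORY = {
    "grocery": ["Grofers", "BigBasket"],
    "non-grocery": ["Amazon", "SnapDeal"],
}

def plan_searches(smart_list):
    """Return (site, item) pairs for a list of (item, category) tuples."""
    plan = []
    for item, category in smart_list:
        # Items with an unknown category are skipped rather than searched everywhere.
        for site in SITES_BY_CATEGORY.get(category, []):
            plan.append((site, item))
    return plan
```

Each pair in the returned plan corresponds to one search that the scraping component would run.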


3.7. Display 

The search results for the items in the smart list are made available to the user for review. The
user can choose the e-commerce site based on the feedback and price details, and can remove any
item from the smart list if the search results make it unattractive to buy. Once the items are
moved to the cart on the respective e-commerce site, the user can proceed with the purchase in
the usual way.
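
One possible way to prepare the search results for review is to group the offers by item and
sort them by price, skipping items the user has removed. The sketch below assumes each offer is
an (item, site, price) tuple; that shape and the function name are hypothetical.

```python
from collections import defaultdict

def group_offers(offers, removed=()):
    """Group (item, site, price) offers by item, cheapest first, skipping removed items."""
    by_item = defaultdict(list)
    for item, site, price in offers:
        if item not in removed:
            by_item[item].append((price, site))
    return {item: sorted(found) for item, found in by_item.items()}
```

The first entry for each item is then the cheapest site, which is the comparison the user would
otherwise have to make by visiting each site separately.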


4. RESULTS AND DISCUSSIONS 

The findings of our investigation are discussed in this section. Table 4 compares the available
voice recognition APIs. Rev.AI has the highest accuracy (86%); however, it supports only 26
languages. Google Speech API has an accuracy of 84% and supports more than 125 languages.

Table 5 compares different handwritten text recognition methods. Tesseract OCR, a Google
open-source engine, has the highest recognition accuracy, reaching above 92%, and leads in usage
because it is freely available. The convolutional recurrent neural network (CRNN), a deep
learning model with an accuracy of 88%, comes second. Table 6 compares various machine learning
classification techniques used for binary text categorization. The support vector machine and
Naive Bayes algorithms have the highest percentages of correctly classified data. Based on these
results, we adopt the support vector machine because of its high accuracy in text classification
and its suitability for short texts.


Table 4. Comparative analysis of popular speech recognition APIs

API | Accuracy | Languages supported | Price
Google Speech API | 84% | 125+ | $0.036/minute
Amazon STT | 73% | 30 | $0.024/minute
Azure STT | 78% | 100+ | Free and paid ($1/hour)
Rev.AI | 86% | 26 | $0.035/minute
IBM Watson API | 771% | 13 | Free and paid ($0.01/minute)
Google Cloud API | 79% | 80 | Free and paid ($0.024/minute)
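
The per-minute prices in Table 4 can be turned into a rough monthly cost for one user. The
sketch below copies the accuracy and paid per-minute rates of four rows from the table; the usage
figures (eight lists per month, 30 seconds each) are hypothetical. Azure STT and IBM Watson API
are left out to keep the example short.

```python
# (accuracy %, paid price in USD per minute) copied from Table 4.
APIS = {
    "Google Speech API": (84, 0.036),
    "Amazon STT": (73, 0.024),
    "Rev.AI": (86, 0.035),
    "Google Cloud API": (79, 0.024),
}

def monthly_cost(price_per_minute, lists_per_month=8, seconds_per_list=30):
    """Cost of dictating shopping lists; both usage defaults are hypothetical."""
    minutes = lists_per_month * seconds_per_list / 60
    return price_per_minute * minutes

def rank_by_accuracy(apis):
    """Most accurate API first; ties go to the cheaper one."""
    return sorted(apis, key=lambda name: (-apis[name][0], apis[name][1]))
```

Under these assumptions a user dictates four minutes of audio per month, so the price difference
between the APIs is a fraction of a dollar and accuracy and language coverage are the more
relevant criteria.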


Table 5. Efficiency comparison of various OCR algorithms

Name of algorithm | Accuracy (%) | Languages supported
Tesseract OCR | 70.2-92.9 | 100+
OCR with recurrent neural network (RNN) | 87 | Can train any language
OCR with convolutional neural network (CNN) | 715 | Can train any language
OCR with convolutional recurrent neural network (CRNN) | 88 | 80+


Table 6. Comparison between various text classification algorithms

Classifier | Time (sec) | Correctly classified (%) | Incorrectly classified (%) | MAE
SVM | 0.09 | 77 | 23 | 0.226
Naive Bayes | 0.03 | 76 | 24 | 0.284
Decision table | 0.23 | 72 | 28 | 0.341
Random Forest | 0.55 | 74 | 26 | 0.26
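
Table 6 does not state how the mean absolute error (MAE) was computed. The following sketch
assumes one common convention for binary classification, in which MAE is the average absolute
difference between the predicted probability of the positive class and the 0/1 true label. Under
that assumption MAE need not equal the share of incorrectly classified items, which would be
consistent with the table. The function name and inputs are illustrative.

```python
def evaluate(true_labels, predicted_probs, threshold=0.5):
    """Return (correct %, incorrect %, MAE) for 0/1 labels and P(label == 1)."""
    n = len(true_labels)
    # A prediction is correct when thresholding the probability gives the true label.
    correct = sum(
        (p >= threshold) == bool(y) for y, p in zip(true_labels, predicted_probs)
    )
    # Assumed MAE convention: mean |true label - predicted probability|.
    mae = sum(abs(y - p) for y, p in zip(true_labels, predicted_probs)) / n
    correct_pct = 100 * correct / n
    return correct_pct, 100 - correct_pct, mae
```

The correctly and incorrectly classified percentages always sum to 100, as they do in every row
of Table 6.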


5. CONCLUSION 

In this work, a comparative study of various natural language processing algorithms was carried
out to determine their efficacy for generating smart shopping lists. Among the speech recognition
systems, the PLP-based approach used by Google is found to be effective. OCR using the LSTM
module of the Tesseract engine uses line-finding algorithms that enhance the identification of
handwritten text. Classification of the text based on its semantics for smart shopping is found
to work best with support vector machine classifiers, which achieved the highest share of
correctly classified items and the lowest mean absolute error among the classifiers compared.